Abstract

Many state-of-the-art visualization techniques must be tailored to the specific type of dataset, its
modality (CT, MRI, etc.), the recorded object or anatomical region (head, spine, abdomen, etc.) and
other parameters related to the data acquisition process. While parts of the information (imaging
modality and acquisition sequence) may be obtained from the meta-data stored with the volume scan,
other important information is not stored explicitly (anatomical region, tracing compound). Also,
meta-data might be incomplete, inappropriate or simply missing.

This paper presents a novel and simple method of determining the type of dataset from previously
defined categories. 2D histograms based on intensity and gradient magnitude of datasets are used as
input to a neural network, which classifies each dataset into one of the categories it was trained with.
The proposed method is an important building block for visualization systems to be used autonomously by
non-experts. The method has been tested on 80 datasets, divided into 3 classes and a "rest" class.

A significant result is the ability of the system to classify datasets into a specific class after being 
trained with only one dataset of that class. Other advantages of the method are its easy implementation 
and its high computational performance. 

Keywords: volume visualization, 3D datasets, 2D histograms, neural networks, classification. 



1 Introduction 

Volume visualization techniques have evolved
tremendously in recent years. Nevertheless, the
users of volume visualization systems, who are
mainly physicians or other domain scientists
with only marginal knowledge of the technical
aspects of volume rendering, still report
usability problems. The overall aim of current
research in the field of volume visualization is
to build an interactive rendering system which
can be used autonomously by non-experts.

Recent advances in the field of user interfaces
for volume visualization, such as [11] and [12],
have shown that semantic models may be tailored
to the specific visualization process and the
type of data in order to meet these requirements.
The semantic information is built upon a priori
knowledge about the important structures
contained in the dataset to be visualized. A
flexible visualization system must thus contain
a large number of different semantic models for
the huge variety of different examination
procedures.

An important building block for an effective 
volume rendering framework is a classification 
technique which detects the type of dataset in 
use and automatically applies a specific seman- 
tic model or visualization technique. For exam- 
ple, some methods are created specifically for 
visualizing MRI scans of the spine or CT scans 
of the head, and those methods rely on the actual 
dataset being of that type (i.e. its modality and 
its anatomical region). 

The prior knowledge required for selecting an
appropriate visualization technique includes
imaging modality, acquisition sequence,
anatomical region, as well as other parameters
such as the chemical tracing compound. This goes
beyond the information stored in the file system
or the meta-data. We therefore propose a
technique which classifies the datasets using a
neural network that operates on statistical
information, i.e. on histograms of the 3D data
itself.

The remainder of the paper is structured as
follows: In the next section we review related
work important to our paper. Section 3 describes
our proposed method for automatic classification
of 3D datasets. In Section 4 we describe the test
environment our solution was integrated in.
Section 5 presents and discusses the results of
our approach and Section 6 concludes the paper.

For readers unfamiliar with the topic, a good
(and relatively short) introduction to
feed-forward neural networks is presented by
Svozil et al. [14].

2 Related work 

The 2D histogram based on intensity and gradient
magnitude was introduced in a seminal paper by
Kindlmann and Durkin [7], and extended to
multi-dimensional transfer functions by Kniss
et al. [8]. Lundstrom et al. [10] introduced
local histograms, which utilize a priori knowledge
about spatial relationships to automatically
differentiate between different tissue types.
Sereda et al. [16] introduced the LH histogram to
classify material boundaries.

Tzeng et al. [15] suggest an interactive
visualization system which allows the user to
mark regions of interest by roughly painting the
boundaries on a few slice images. During painting,
the marked regions are used to train a neural
network for multi-dimensional classification. Del
Rio et al. adapt this approach to specify transfer
functions in an augmented reality environment for
medical applications [5]. Zhang et al. [17] apply
general regression neural networks to classify
each point of a dataset into a certain class. This
information is later used for assigning optical
properties (e.g. color). Cerquides et al. [2] use
different methods to classify each point of a
dataset. They use this classification information
later to assign optical properties to voxels.
While these approaches utilize neural networks to
assign optical properties, the method presented
here aims at classifying whole datasets into
categories. The category information is
subsequently used as a priori knowledge to
visualize the dataset.

Liu et al. [9] classify CT scans of the brain
into pathological classes (normal, blood, stroke)
using a method firmly rooted in Bayes decision
theory.

Serlie et al. [13] also describe a 3D
classification method, but their work is focused
on material fractions, not on the whole dataset.
They fit the arch model to the LH histogram,
parameterizing a single arch function by expected
pure material intensities at opposite sides of the
edge (L, H) and a scale parameter. As a peak in the
LH histogram represents one type of transition,
the cluster membership is used to classify edge
voxels as transition types.

Ankerst et al. [1] perform classification using
quadratic form distance functions on a special
type of histogram (shell and sector model) of the
physical shape of the objects.

3 Automatic Classification of Volume Datasets

The method described in this paper was mostly
inspired by [18]. In [18], neural networks are
used to position "primitives" on the 2D histogram
in order to create transfer functions aimed at an
effective volume visualization. The method
presented here is similar in the sense that it
uses 2D histograms as inputs to neural networks.

One of the widely used visualization approaches
for 3D data today is direct volume rendering [6]
by means of a 2D transfer function. 2D transfer
functions are created with respect to the combined
intensity/derivative histogram. Such histograms in
turn may be viewed as grayscale images. All
histograms of the same 3D dataset type (like
different CT scans of the thorax) look similar to
human observers. Likewise, histograms of different
dataset types usually look noticeably different
(see Fig. 1). Our method stems from this
observation.
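
As an illustration of the input data, the following sketch computes a 2D intensity/gradient-magnitude histogram with NumPy. The function name `intensity_gradient_histogram`, the central-difference gradient obtained from `np.gradient`, and the logarithmic scaling are assumptions made for this example; the text does not specify how the histograms are computed.

```python
import numpy as np

def intensity_gradient_histogram(volume, bins=256):
    """Return a bins x bins intensity/gradient-magnitude histogram in [0, 1]."""
    vol = np.asarray(volume, dtype=np.float64)
    # Central differences along the three axes (assumed gradient estimator).
    gz, gy, gx = np.gradient(vol)
    grad_mag = np.sqrt(gx ** 2 + gy ** 2 + gz ** 2)
    hist, _, _ = np.histogram2d(vol.ravel(), grad_mag.ravel(), bins=bins)
    # Log scaling keeps sparsely filled bins visible (an assumption).
    hist = np.log1p(hist)
    peak = hist.max()
    return hist / peak if peak > 0 else hist

# Hypothetical random volume, used only to exercise the function.
rng = np.random.default_rng(0)
demo_volume = rng.random((16, 16, 16))
demo_hist = intensity_gradient_histogram(demo_volume, bins=32)
```

The result can be viewed as a grayscale image, with intensity along one axis and gradient magnitude along the other.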

Neural networks can easily be trained to ap- 
proximate many unknown functions for which 
we have observations in the form of input-output 
combinations. That makes neural networks suit- 
able for classifying input histograms into cate- 
gories. 

The straightforward approach is to use the
histogram pixels (normalized to the [0,1] range)
as inputs to the neural network. On the output
side, each output corresponds to one category.
We take the outputs as representing the
probability that the input belongs to the
corresponding category. Thus we have a
k-dimensional output for k categories. For
example, assume that we have the following [0, 1]
normalized^1 outputs for some input:

0.893456
0.131899
0.044582

We interpret them as the probabilities of the
input belonging to the respective category
(category one - 89%, category two - 13% and
category three - 4%). Notice that the actual
outputs in general do not add up to 100%.

Figure 1: Some of the histograms. Each of
the first 3 rows represents one class. Histograms in
the last two rows together represent the rest class.



^1 The activation function employed in the neural
network we used produces outputs in the convenient
range [0, 1], so no additional normalization is necessary.
In order to identify the most probable
classification result, the output with the maximum
value is chosen. Therefore, this input would be
classified as belonging to category one. Figs. 3,
4 and 5 show actual outputs of a neural network
(to make them easier to tell apart, descriptive
names are given to the outputs).

A training sample consists of the histogram 
input and the desired output vector. In the de- 
sired output vector, only the correct output cate- 
gory has value 1, while all the others have value 
0. 
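
The output interpretation and the desired output vector described above can be sketched in a few lines. The helper names `one_hot` and `most_probable` are hypothetical and chosen only for this illustration; the output values are the example values from the text.

```python
def one_hot(index, k):
    """Desired output vector: 1 for the correct category, 0 elsewhere."""
    return [1.0 if i == index else 0.0 for i in range(k)]

def most_probable(outputs):
    """Index of the output with the maximum value."""
    return max(range(len(outputs)), key=lambda i: outputs[i])

outputs = [0.893456, 0.131899, 0.044582]   # example values from the text
winner = most_probable(outputs)            # index 0, i.e. category one
target = one_hot(winner, len(outputs))     # desired vector for that category
```

Because each output is produced independently, the values are compared with each other but are not expected to sum to one.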

In our implementation we chose the multilayer
perceptron (MLP), a type of neural network which
is capable of performing the required task. It is
trained by the back-propagation algorithm. One
major benefit of the MLP is that additional
outputs can be added fairly easily, while
retaining the function of all the other outputs.
With some other types of neural networks, a new
network would have to be created and trained from
scratch whenever a new category is added, which
wastes time. Furthermore, this would cause
differently randomized initial weights, thus
leading to slightly different results. In our
version, we only need to add weights between the
newly inserted neuron in the output layer and all
neurons in the last hidden layer (see Fig. 2).







Figure 2: Adding an output preserves existing 
weights. The neural network depicted here is a 
very small example (compared to real examples) 
suitable for graphical representation and explana- 
tion. 
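
A minimal sketch of this extension step, assuming the output-layer weights are stored as a NumPy matrix with one row per output neuron. The function `add_output`, the bias vector and the initialization scale are assumptions for illustration and are not taken from the implementation described here.

```python
import numpy as np

def add_output(w_out, b_out, rng, scale=0.1):
    """Append one output neuron; existing rows of w_out are left unchanged.

    w_out has shape (n_outputs, n_hidden); b_out has shape (n_outputs,).
    """
    n_hidden = w_out.shape[1]
    # Only the weights of the new neuron are randomly initialized.
    new_row = rng.normal(0.0, scale, size=(1, n_hidden))
    new_bias = rng.normal(0.0, scale, size=1)
    return np.vstack([w_out, new_row]), np.concatenate([b_out, new_bias])

rng = np.random.default_rng(1)
w_out = rng.normal(size=(3, 8))   # 3 categories, 8 hidden neurons (hypothetical)
b_out = np.zeros(3)
w_new, b_new = add_output(w_out, b_out, rng)
```

All weights that existed before the call are carried over unchanged, so the previously trained outputs initially behave as before.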



As feed-forward networks can approximate
any continuous real function with as few as
3 layers, we have only tested networks with 3
and 4 layers. A smaller number of layers can be
compensated by a larger number of neurons
in the hidden layer(s). Although some differ- 
ences exist (see fT,'?!), they are not relevant for 
this method (see Fig. 6). All the results (except
Fig. 6) presented here are obtained using
a 3-layer neural network.

3.1 Modeling the Rest Class 

There are two ways to deal with datasets that do
not fall into any of the well-defined classes, i.e.
the miscellaneous datasets. The first approach
is to have a "rest class", to which all of these
datasets are assigned. The second approach
assumes that elements from the rest class usually
do not strongly activate any of the outputs, so
that the maximum output often has a value of
around 0.5 (50%). The second approach therefore
uses a threshold for successful classification:
if the value of the maximum output is below that
threshold, the dataset is not classified into any
of the well-defined classes and is considered to
be part of the rest class.
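
The threshold rule can be sketched as follows. The function name `classify_with_threshold`, the default threshold of 0.7 and the output values are hypothetical and serve only as an illustration.

```python
def classify_with_threshold(outputs, names, threshold=0.7, rest="rest"):
    """Return the winning class name, or `rest` if the winner is too weak."""
    best = max(range(len(outputs)), key=lambda i: outputs[i])
    if outputs[best] < threshold:
        return rest               # no output is activated strongly enough
    return names[best]

names = ["CTA", "MR", "mr_ciss"]
confident = classify_with_threshold([0.89, 0.13, 0.04], names)   # "CTA"
uncertain = classify_with_threshold([0.52, 0.47, 0.31], names)   # "rest"
```

When a dedicated rest-class output is also present, the same rule can be applied to the enlarged output vector.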

From a conceptual point of view, the threshold
approach is independent of the rest-class
approach, i.e. each of the concepts can be applied
separately. From a practical point of view, the
two approaches are not completely independent: the
better trained the rest class is, the less effect
thresholding has. Furthermore, providing a large
number of training samples to the rest class
affects the reliability^2 of the classification of
the normal (well-defined) classes. If this is
coupled with a high threshold, many "false
negatives" emerge (datasets misclassified as
belonging to the rest class instead of a
well-defined class). However, applying both
approaches is beneficial when fewer training
samples are available for the rest class.

3.2 Performance issues 

If we directly use histogram pixels as the net- 
work's inputs, we have a large number of inputs. 



^2 Reliability in this context is the value of the
maximum output.
Figure 5: Raw outputs of the network without the rest class. Trained with 3 samples per class.

Figure 6: Using a 4-layer neural network does not significantly improve results. Only the value of the maximum
output is shown for each dataset. Abbr.: ts/cl - training samples per class.



For example, for a 256x256 histogram we get 64K
inputs^3. If the second layer contains 64 neurons,
the number of weights between the 1st and 2nd
layer is 4M. In our implementation, the weights
are 32-bit floats, which leads to 16MB just for
the storage of the weights between the 1st and the
2nd layer. The number of weights between the other
layers is significantly smaller, due to the much
lower number of neurons in these layers.
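
The figures above follow from simple arithmetic, reproduced here as a short check. The helper `first_layer_weights` is a hypothetical name, and the 32x32 case is included only for comparison.

```python
K, M = 2 ** 10, 2 ** 20          # binary prefixes, as used in the text

def first_layer_weights(side, hidden=64):
    """Number of weights between the input layer and the first hidden layer."""
    return side * side * hidden

full = first_layer_weights(256)
print(256 * 256 // K, "K inputs")                # 64 K inputs
print(full // M, "M weights")                    # 4 M weights
print(full * 4 // M, "MB as 32-bit floats")      # 16 MB as 32-bit floats

reduced = first_layer_weights(32)                # a 32x32 histogram
print(reduced * 4 // K, "KB as 32-bit floats")   # 256 KB as 32-bit floats
```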

Nevertheless, the overall memory consumption is
considerable. Furthermore, the training becomes
very slow, and persistent storage on a hard disk
would not be convenient due to slow reading,
writing and data transfer.







Figure 7: Size reduction. Upper left is the original 
256x256, lower right is 8x8 

Therefore, we incorporated a downscaling
scheme for the histograms by rebinning. This
not only greatly reduces the amount of data, but
also largely eliminates small details present in
the histograms. The exact positions of these
details differ for every dataset, so they are
only an obstacle for comparison purposes.

^3 The prefixes K and M here mean 2^10 and 2^20.

For simplicity, our implementation only allows
reduction by factors that are powers of 2. That
is: 0 - no reduction, 1 - reduction to 128x128,
2 - reduction to 64x64, etc. Most of the tests
have been conducted with reduction factor 3
(histogram size 32x32).
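
Rebinning by a power of 2 can be sketched with a NumPy reshape. The function name `rebin` is hypothetical, and summing the merged bins (rather than averaging them) is an assumption; either way the result would be rescaled to [0, 1] before being fed to the network.

```python
import numpy as np

def rebin(hist, factor):
    """Downscale a square histogram by 2**factor per axis by summing bins."""
    step = 2 ** factor
    side = hist.shape[0] // step
    # Group the bins into step x step blocks and sum each block.
    return hist.reshape(side, step, side, step).sum(axis=(1, 3))

hist = np.ones((256, 256))       # placeholder histogram with equal counts
small = rebin(hist, 3)           # reduction factor 3 gives 32x32
```

With reduction factor 0 the histogram is returned at its original size, and the total count is preserved for every factor.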

4 Testing environment 

The described method is implemented in a
visualization tool called OpenQVis. OpenQVis
focuses on real-time visualization, relying on
the features of modern graphics cards [6].

OpenQVis has different "models" of transfer 
functions, which are used to visualize different 
types of 3D datasets. Examples are: CT angiog- 
raphy of the head, MRI scans of the spinal cord, 
MRI scans of the head, and so on. These models 
were considered as classes for our method. 

OpenQVis allows the user to navigate to a 
model list and to choose one for the currently 
opened dataset. If the chosen model is not in the 
list of the output classes, a new output class is 
added to the neural network and the network is 
re-trained with this new training sample. If the 
chosen class is already present in the outputs, 
the network is re-trained with this new training 
sample included. If the histogram of the cur- 
rently opened dataset exists among the training 
samples, the sample is updated to reflect the new 
user preference. 
Saving training samples with the neural network
data is required because each re-training
consists of many epochs, and if only the newest
sample is used the network gradually "forgets"
previous samples, which is, of course,
undesirable. So, all saved samples are used for
each epoch in the re-training process.
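
The bookkeeping described above can be sketched as a small sample store. The class `SampleStore`, its method names and the string keys standing in for histograms are hypothetical; this is not the OpenQVis implementation.

```python
class SampleStore:
    """Keeps every training sample so that re-training can replay all of them."""

    def __init__(self):
        self.classes = []      # output class names, in output order
        self.samples = {}      # histogram key -> chosen class name

    def record(self, hist_key, model):
        """Store the user's choice; return True if a new output is needed."""
        is_new_class = model not in self.classes
        if is_new_class:
            self.classes.append(model)
        # An already stored histogram is updated to the new preference.
        self.samples[hist_key] = model
        return is_new_class

    def training_set(self):
        """All (key, desired output vector) pairs, used in every epoch."""
        return [
            (key, [1.0 if c == model else 0.0 for c in self.classes])
            for key, model in self.samples.items()
        ]

store = SampleStore()
first = store.record("scan_a", "CTA head")    # new class, new output
second = store.record("scan_b", "MRI head")   # new class, new output
third = store.record("scan_a", "MRI head")    # existing sample is updated
```

After each call to `record`, the network would be re-trained on everything returned by `training_set`, not only on the newest sample.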

For testing purposes, we had three series 
available: 

1. Computed tomography - angiography of 
the head (CTA_*), 23 datasets 

2. Magnetic resonance images of the head, 
both preoperative and inter-operative 
(MR_*), 15 datasets 

3. Magnetic resonance - constructive interfer- 
ence in the steady state, mostly scans of the 
spine (mr_ciss_*), 19 datasets 

Furthermore, we had 23 miscellaneous datasets
(almost all freely available on the Internet).
Two of those datasets were synthetic (bucky and
tentacle), generated directly from computer 3D
models and not acquired by means of a scanning
device.

This method can differentiate between cases 
within the same scanning modality. We tested 
this with available but confidential CTA heart 
datasets, which were clearly discernible from 
CTA head datasets. 

5 Results 

The classification based on our neural network
approach takes, depending on the histogram
reduction factor, mere microseconds. The training
takes milliseconds for reduction factor 4 and
higher (histograms of 16x16 and smaller). The
training for reduction factor 3 takes noticeable
fractions of a second (0.2 s to 0.6 s) in our
tests, and for reduction factor 2 it takes seconds
(3-10 seconds). The variations in training time
result from the termination condition. We set the
termination condition to an MSE^4 threshold of
0.003, which was almost always met before the
maximum number of epochs was reached.
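
To make the termination condition concrete, the following is a minimal NumPy sketch of full-batch back-propagation for a 3-layer network with sigmoid activations. It is not the implementation used for the reported timings: the function `train`, the learning rate, the hidden-layer size, the weight initialization and the random input data are all assumptions, and stopping at `mse <= mse_goal` is one reading of the condition.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train(inputs, targets, hidden=8, lr=0.5, mse_goal=0.003,
          max_epochs=5000, seed=0):
    """Train a 3-layer MLP; stop at mse_goal or after max_epochs.

    Returns the weights and the list of MSE values, one per epoch.
    """
    rng = np.random.default_rng(seed)
    w1 = rng.normal(0.0, 0.5, (inputs.shape[1], hidden))
    b1 = np.zeros(hidden)
    w2 = rng.normal(0.0, 0.5, (hidden, targets.shape[1]))
    b2 = np.zeros(targets.shape[1])
    history = []
    n = len(inputs)
    for _ in range(max_epochs):
        h = sigmoid(inputs @ w1 + b1)       # hidden activations
        out = sigmoid(h @ w2 + b2)          # outputs in [0, 1]
        err = out - targets
        history.append(float(np.mean(err ** 2)))
        if history[-1] <= mse_goal:         # termination condition
            break
        # Back-propagate the error through both sigmoid layers
        # (constant factors of the MSE gradient are folded into lr).
        d_out = err * out * (1.0 - out)
        d_hid = (d_out @ w2.T) * h * (1.0 - h)
        w2 -= lr * (h.T @ d_out) / n
        b2 -= lr * d_out.mean(axis=0)
        w1 -= lr * (inputs.T @ d_hid) / n
        b1 -= lr * d_hid.mean(axis=0)
    return (w1, b1, w2, b2), history

rng = np.random.default_rng(42)
x = rng.random((3, 16))     # three hypothetical 4x4 histograms, flattened
t = np.eye(3)               # one sample per class, desired output vectors
params, mse_history = train(x, t)
```

The number of epochs actually run is `len(mse_history)`, which is what makes the training time vary between runs.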

The reliability of classification is directly
associated with the reduction factor. As seen in
Fig. 8, the reliability decreases as the histogram
size decreases.

^4 MSE = mean squared error

The choice of the dataset which is used to
represent a class influences the results to some
degree (see Fig. 9). This influence affects the
classification outcome only in the miscellaneous
group, i.e. the rest class. Choosing an
average-looking histogram for the training, or
average and extreme ones in the case of more
training samples per class, results in a higher
reliability of the classification and in more
uniform output values across all datasets of that
class.

A slight variation of the results with respect to 
the initial randomization of the neural network 
exists, but is negligible. After training the net- 
work with one sample of each type, the aver- 
age difference in outputs (due to different ini- 
tial weights) is around 1%. The maximum for 
any dataset is 5%. These differences get smaller 
with a greater number of training samples. 

With the rest-class approach, all of the
misclassifications occur in the miscellaneous
group (see Tab. 1). This means, for example, that
no CTA is classified as anything other than CTA.
Only datasets from the miscellaneous group are
wrongly classified as something else (CTA, MR, or
mr_ciss). This is true even if the neural network
is trained with only one sample of each type.

The thresholding approach produces fewer
misclassifications in the miscellaneous group,
but it misclassifies some datasets of the other
classes ("false negatives").

From Figs. 3, 4 and 5 and Tab. 1 it can be seen
that the threshold is a tuning parameter.
Therefore, it should be set high only in specific
situations, and in most cases it should be set to
a more conservative value (50%-70%).

Training with multiple datasets of specific
classes improves the reliability. Training with
multiple datasets of the rest class lowers the
misclassification rate (see Fig. 10).

An alternative method for classifying 3D 
datasets would be to use a downscaled version 
of the dataset itself instead of the 2D histogram 
as input to the neural network. This alterna- 
tive, however, would strongly incorporate ge- 
ometric aspects, like the individual orientation 
of the recorded specimen into the classification 
Figure 8: Downscaling the histogram images on the input side of the neural network reduces its reliability
and, in extreme cases, prevents the neural network from distinguishing datasets. Only values of correct outputs are
shown - if the desired classification for some dataset is "default", the value of the default output is shown even if it is the
lowest-valued output (such a case is a misclassification).

Figure 9: Choosing different datasets for training the neural network influences the results. Trained with 1
sample per class. Only values of correct outputs are shown.

Figure 10: Increasing the number of training samples improves the results. Only values of correct outputs are
shown.
                                      |   Misclassification rate
Approach/Setup                        | some -> rest | rest -> some
--------------------------------------+--------------+-------------
no rest class, no threshold           |              | all (23)
with rest class, 1 ts/cl              |              | 15-20
with rest class, 2 ts/cl              |              | 10-15
with rc, 2 ts/wdc and 6 ts/rc         |              | 3-5
no rest class, threshold 50%          |              | 10-15
no rest class, threshold 70%          | 0-1          | 5-10
no rc, threshold 90%, 1 ts/cl         | 20-25        | 1-2
no rc, threshold 90%, 2 ts/cl         |              | 3-5
w. rc, threshold 50%, 1 ts/cl         |              | 5-15
w. rc, threshold 70%, 1 ts/cl         | 0-5          | 2-10
w. rc, threshold 90%, 1 ts/cl         | 25-30        | 0-2
w. rc, threshold 90%, 2 ts/cl         | 0-5          | 0-2
w. rc, th. 90%, 2 ts/wdc and 6 ts/rc  | 5-10         |

Table 1: Comparison of misclassification rates
for different setups of the classifier. If not
specified, the classification has been performed
using varying parameters in terms of number of
training samples per class (for some setups) or
choice of datasets used for training, resulting in
slightly different misclassification rates.
"rest -> some" means that a member of the rest
class was wrongly classified as a member of a
"well-defined class".
Abbreviations: w. - with, ts - training sample(s),
cl - class, rc - rest class, wdc - well-defined class,
th. - threshold.

process. As a result, the training phase would
become more difficult, more training samples
would be required, and the number of input
nodes would have to increase considerably to
achieve a robustness comparable to the described
histogram method.

6 Conclusion 

We have presented a robust technique to
automatically classify 3D volume datasets
according to the acquisition sequence, the
recorded specimen and sequence-related parameters.
It is remarkable that a single training sample of
a certain type is sufficient to properly classify
all the other datasets of the same type. Depending
on the type of visualization system in which this
method is used, the end user might not need to
know anything about it.

Depending on the amount of information
about the data and the application scenario,
the architecture of the neural network can be
adapted to better suit typical use cases (we took
a general approach here).

The majority of misclassifications^5 are caused
by datasets belonging to the miscellaneous group.
As researchers, we had many different
miscellaneous datasets readily available.
However, in production systems the number of
datasets in the rest class should be comparatively
smaller, thus making this method more appropriate.

^5 With the rest-class approach.

An additional advantage of this method is its 
easy implementation. Successful implementa- 
tions may be based on one of the many free neu- 
ral network implementations around (for exam- 
ple, on SourceForge). As a result, the benefits of 
including this method in a production visualiza- 
tion system (if suitable) will easily outweigh the 
implementation costs. 